Title
Keyword-based Topic Modeling and Keyword Selection
Authors
Abstract
Certain types of documents, such as tweets, are collected by specifying a set of keywords. As topics of interest change over time, it is beneficial to adjust the keywords dynamically. The challenge is that the keywords must be specified before the forthcoming documents and their underlying topics are known. Future topics should mimic past topics of interest, yet contain some novelty. We develop a keyword-based topic model that dynamically selects a subset of keywords to be used to collect future documents. The generative process first selects keywords and then generates the underlying documents based on the specified keywords. The model is trained using a variational lower bound and stochastic gradient optimization. Inference consists of finding a subset of keywords such that, given the subset, the model predicts the underlying topic-word matrix for the unknown forthcoming documents. We compare the keyword topic model against a benchmark model that uses viral predictions of tweets combined with a topic model. The keyword-based topic model outperforms this sophisticated baseline model by 67%.