Paper Title
Temporal Analysis on Topics Using Word2Vec
Paper Authors
Paper Abstract
The present study proposes a novel method of trend detection and visualization: specifically, modeling how a topic changes over time. Where current models for identifying and visualizing trends convey only the popularity of a single word, based on stochastic counting of usage, the approach in the present study illustrates both the popularity of a topic and the direction in which it is moving. The direction in this case is a distinct subtopic within the selected corpus. Such trends are generated by modeling the movement of a topic, using k-means clustering and cosine similarity to track the distances between clusters over time. In a convergent scenario, it can be inferred that the topics as a whole are meshing (tokens become interchangeable between topics). Conversely, a divergent scenario implies that each topic's respective tokens are not found in the same context (the words are increasingly different from each other). The methodology was tested on a group of articles from various media houses present in the 20 Newsgroups dataset.
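The convergent and divergent scenarios described in the abstract can be illustrated with a minimal sketch. It assumes that Word2Vec embeddings and k-means cluster assignments have already been computed upstream for each time slice, and it only shows the last step: comparing two cluster centroids by cosine similarity over time. The function names, the two-dimensional toy vectors, and the rule of comparing the first and last time slice are illustrative assumptions, not the procedure from the paper.

```python
import math


def centroid(vectors):
    """Mean vector of a cluster's token embeddings."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_over_time(topic_a_slices, topic_b_slices):
    """Centroid-to-centroid cosine similarity for each time slice."""
    return [
        cosine_similarity(centroid(a), centroid(b))
        for a, b in zip(topic_a_slices, topic_b_slices)
    ]


def trend(similarities, tolerance=1e-9):
    """Label the trend by comparing the last slice with the first.

    This first-versus-last rule is a simplifying assumption.
    """
    change = similarities[-1] - similarities[0]
    if change > tolerance:
        return "convergent"
    if change < -tolerance:
        return "divergent"
    return "stable"


# Hypothetical toy embeddings: three time slices, two tokens per cluster.
topic_a = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.8, 0.3], [0.7, 0.4]],
    [[0.6, 0.6], [0.5, 0.5]],
]
topic_b = [
    [[0.0, 1.0], [0.1, 0.9]],
    [[0.2, 0.8], [0.3, 0.9]],
    [[0.5, 0.6], [0.6, 0.5]],
]

sims = similarity_over_time(topic_a, topic_b)
label = trend(sims)
print([round(s, 3) for s in sims], label)
```

In this toy data the two centroids drift toward each other, so the similarity rises across the slices and the pair is labeled convergent. Swapping the order of the time slices would produce the divergent case.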