Paper title
Top-Rank-Focused Adaptive Vote Collection for the Evaluation of Domain-Specific Semantic Models
Paper authors
Paper abstract
The growth of domain-specific applications of semantic models, boosted by the recent achievements of unsupervised embedding learning algorithms, demands domain-specific evaluation datasets. In many cases, content-based recommenders being a prime example, these models are required to rank words or texts according to their semantic relatedness to a given concept, with particular focus on top ranks. In this work, we make a threefold contribution to address these requirements: (i) we define a protocol for the construction, based on adaptive pairwise comparisons, of a relatedness-based evaluation dataset tailored to the available resources and optimized to be particularly accurate in top-rank evaluation; (ii) we define appropriate metrics, extensions of well-known ranking correlation coefficients, to evaluate a semantic model via the aforementioned dataset by taking into account the greater significance of top ranks; and (iii) we define a stochastic transitivity model to simulate semantic-driven pairwise comparisons, which confirms the effectiveness of the proposed dataset construction protocol.
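To make the idea of a top-rank-focused ranking correlation concrete, the following is a minimal sketch of one possible top-weighted variant of Kendall's tau. It is not the metric defined in the paper: the function name `top_weighted_tau`, the hyperbolic pair weight 1/(r_i + 1) + 1/(r_j + 1), and the toy data are all assumptions chosen for illustration. SciPy's `scipy.stats.weightedtau` provides a related, more complete weighted statistic.

```python
from itertools import combinations


def top_weighted_tau(reference, scores):
    """Top-weighted Kendall-style agreement between a gold ranking and model scores.

    reference: items ordered from most to least related (hypothetical gold ranking).
    scores:    dict mapping each item to a model score (higher = more related).

    Each pair of items gets a weight that grows as either item sits closer to the
    top of the reference ranking (an assumed hyperbolic scheme, not the paper's).
    """
    num = 0.0
    den = 0.0
    for (ri, a), (rj, b) in combinations(enumerate(reference), 2):
        # ri < rj here, so the reference ranks a above b.
        weight = 1.0 / (ri + 1) + 1.0 / (rj + 1)
        diff = scores[a] - scores[b]
        sign = (diff > 0) - (diff < 0)  # +1 concordant, -1 discordant, 0 tie
        num += weight * sign
        den += weight
    return num / den if den else 0.0


# Toy example (made-up data): the same kind of error at the top and at the bottom.
reference = ["a", "b", "c", "d"]
perfect = {"a": 4, "b": 3, "c": 2, "d": 1}
swap_top = {"a": 3, "b": 4, "c": 2, "d": 1}     # top two items swapped
swap_bottom = {"a": 4, "b": 3, "c": 1, "d": 2}  # bottom two items swapped

tau_perfect = top_weighted_tau(reference, perfect)
tau_swap_top = top_weighted_tau(reference, swap_top)
tau_swap_bottom = top_weighted_tau(reference, swap_bottom)
print(tau_perfect, tau_swap_top, tau_swap_bottom)
```

With this weighting, swapping the two top-ranked items lowers the score more than swapping the two bottom-ranked ones, which is the qualitative behavior the abstract asks of a top-rank-aware metric.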