Paper title
How UMass-FSD Inadvertently Leverages Temporal Bias
Authors
Abstract
First Story Detection is the task of identifying new events in a stream of documents. The UMass-FSD system is known for its strong performance in First Story Detection competitions, and it has recently been used frequently as a high-accuracy baseline in research publications. We are the first to discover that UMass-FSD inadvertently leverages temporal bias. Interestingly, the discovered bias contrasts with previously known biases and yields significantly better performance. Our analysis reveals an increased contribution of temporally distant documents, which results from an unusual way of handling incremental term statistics. We show that this form of temporal bias also applies to other well-known First Story Detection systems, where it improves detection accuracy. To provide a more generalizable conclusion and to demonstrate that the observed bias is not merely an artefact of a particular implementation, we present a model that intentionally leverages a bias on temporal distance. Our model significantly improves the detection effectiveness of state-of-the-art First Story Detection systems.
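The abstract does not spell out the model, so the following is only a minimal sketch of the general idea: a nearest-neighbour novelty detector whose similarity to each earlier document is scaled by a function of temporal distance. It is not the UMass-FSD implementation or the authors' model. The names `novelty_scores` and `weight`, the raw term-count vectors, and the cosine similarity are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def novelty_scores(docs, weight=lambda dt: 1.0):
    """Score each document by its distance to the closest earlier document.

    docs   -- list of (timestamp, text) pairs in arrival order
    weight -- hypothetical temporal weight: maps the time gap between two
              documents to a multiplier on their similarity (1.0 = no bias)
    """
    seen = []    # (timestamp, term vector) of every earlier document
    scores = []
    for ts, text in docs:
        vec = Counter(text.lower().split())
        best = 0.0
        for old_ts, old_vec in seen:
            # Assumption: the temporal bias acts as a scaling of similarity.
            best = max(best, weight(ts - old_ts) * cosine(vec, old_vec))
        scores.append(1.0 - best)  # high score = likely first story
        seen.append((ts, vec))
    return scores
```

Passing a different `weight` function changes how much temporally close or distant documents contribute to the novelty decision. The direction and shape of that function are left to the caller here, because the abstract does not state them.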