Paper Title
Efficient Document Re-Ranking for Transformers by Precomputing Term Representations
Paper Authors
Paper Abstract
Deep pretrained transformer networks are effective at various ranking tasks, such as question answering and ad-hoc document ranking. However, their computational expense renders them cost-prohibitive in practice. Our proposed approach, called PreTTR (Precomputing Transformer Term Representations), considerably reduces the query-time latency of deep transformer networks (up to a 42x speedup on web document ranking), making these networks more practical to use in real-time ranking scenarios. Specifically, we precompute part of the document term representations at indexing time (without a query), and merge them with the query representation at query time to compute the final ranking score. Due to the large size of the token representations, we also propose an effective approach to reduce the storage requirement by training a compression layer to match attention scores. Our compression technique reduces the required storage by up to 95% and can be applied without a substantial degradation in ranking performance.
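To make the precompute-then-merge idea concrete, below is a minimal sketch in plain PyTorch, not the authors' implementation: a toy stack of nn.TransformerEncoderLayer modules stands in for a pretrained transformer, and the layer count, dimensions, split point K, and the scorer head are illustrative assumptions only.

```python
# Sketch of the PreTTR idea: run document terms through the first K layers at
# indexing time (no query), cache the result, and at query time merge it with the
# query representation before running the remaining layers jointly.
import torch
import torch.nn as nn

D_MODEL, N_HEADS, N_LAYERS, K = 64, 4, 6, 3  # K = layers computed independently

layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model=D_MODEL, nhead=N_HEADS, batch_first=True)
     for _ in range(N_LAYERS)]
)
scorer = nn.Linear(D_MODEL, 1)  # hypothetical ranking head over the first position


def run_layers(x, start, end):
    """Apply encoder layers [start, end) to a (batch, seq, d_model) tensor."""
    for layer in layers[start:end]:
        x = layer(x)
    return x


# Indexing time: precompute document term representations through the first K
# layers and persist them alongside the index (optionally compressed).
doc_emb = torch.randn(1, 80, D_MODEL)            # stand-in for document token embeddings
doc_cached = run_layers(doc_emb, 0, K).detach()  # stored at indexing time

# Query time: encode the short query through the same K layers, concatenate it
# with the cached document representations, and finish the remaining layers jointly.
query_emb = torch.randn(1, 8, D_MODEL)           # stand-in for query token embeddings
query_half = run_layers(query_emb, 0, K)
joint = torch.cat([query_half, doc_cached], dim=1)
joint = run_layers(joint, K, N_LAYERS)
score = scorer(joint[:, 0])                      # final ranking score
print(score.shape)
```

In the paper, the cached document representations are additionally passed through a learned compression layer, trained to reproduce the attention scores, before being stored, which is what yields the reported storage savings.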