McQueen: a Benchmark for Multimodal Conversational Query Rewrite

Paper title

McQueen: a Benchmark for Multimodal Conversational Query Rewrite

Authors

Yifei Yuan, Chen Shi, Runze Wang, Liyi Chen, Feijun Jiang, Yuan You, Wai Lam

Abstract

The task of query rewrite aims to convert an in-context query into its fully specified version, in which ellipsis is completed and coreference is resolved according to the history context. Although much progress has been made, less effort has been devoted to real-scenario conversations that involve drawing information from more than one modality. In this paper, we propose the task of multimodal conversational query rewrite (McQR), which performs query rewrite under the multimodal visual conversation setting. We collect a large-scale dataset named McQueen based on manual annotation, which contains 15k visual conversations and over 80k queries, each associated with a fully specified rewrite version. In addition, for entities appearing in the rewrite, we provide the corresponding image box annotation. We then use the McQueen dataset to benchmark a state-of-the-art method for effectively tackling the McQR task, which is based on a multimodal pre-trained model with a pointer generator. Extensive experiments are performed to demonstrate the effectiveness of our model on this task. The dataset and code of this paper are both available at https://github.com/yfyuan01/MQR.
