Paper Title

Commands 4 Autonomous Vehicles (C4AV) Workshop Summary

Authors

Deruyttere, Thierry, Vandenhende, Simon, Grujicic, Dusan, Liu, Yu, Van Gool, Luc, Blaschko, Matthew, Tuytelaars, Tinne, Moens, Marie-Francine

Abstract

The task of visual grounding requires locating the most relevant region or object in an image, given a natural language query. So far, progress on this task has mostly been measured on curated datasets, which are not always representative of human spoken language. In this work, we deviate from recent, popular task settings and consider the problem under an autonomous vehicle scenario. In particular, we consider a situation where passengers can give free-form natural language commands to a vehicle, each of which can be associated with an object in the street scene. To stimulate research on this topic, we have organized the Commands for Autonomous Vehicles (C4AV) challenge based on the recent Talk2Car dataset (URL: https://www.aicrowd.com/challenges/eccv-2020-commands-4-autonomous-vehicles). This paper presents the results of the challenge. First, we compare the benchmark used in the challenge against existing datasets for visual grounding. Second, we identify the aspects that make top-performing models successful and relate them to existing state-of-the-art models for visual grounding; in addition, we detect potential failure cases by evaluating on carefully selected subsets. Finally, we discuss several possibilities for future work.
