Paper Title

Estimating more camera poses for ego-centric videos is essential for VQ3D

Paper Authors

Jinjie Mai, Chen Zhao, Abdullah Hamdi, Silvio Giancola, Bernard Ghanem

Paper Abstract

Visual Queries 3D Localization (VQ3D) is a task in the Ego4D Episodic Memory Benchmark. Given an egocentric video, the goal is to answer queries of the form "Where did I last see object X?", where the query object X is specified as a static image and the answer should be a 3D displacement vector pointing to object X. However, current techniques estimate the camera poses of video frames in naive ways, resulting in a low query-with-pose (QwP) ratio and thus a poor overall success rate. In our work, we design a new pipeline for the challenging problem of egocentric camera pose estimation. Moreover, we revisit the current VQ3D framework and optimize it for both performance and efficiency. As a result, we achieve the top-1 overall success rate of 25.8% on the VQ3D leaderboard, nearly three times the 8.7% reported by the baseline.
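For context, the minimal Python sketch below illustrates the answer format and the QwP ratio described in the abstract. It is an illustrative assumption, not the authors' implementation: all function and variable names are hypothetical. Once the query frame's camera pose is estimated and the object is localized in 3D world coordinates, the answer is the object's displacement expressed in the query camera's frame; the QwP ratio simply measures how often a pose is available for the query frame at all.

import numpy as np

def displacement_in_query_frame(R_cw, t_cw, p_world):
    """Displacement vector from the query camera to the object,
    expressed in the query frame's camera coordinates."""
    # R_cw: 3x3 camera-to-world rotation of the query frame
    # t_cw: camera center of the query frame, in world coordinates
    # p_world: object centroid, in world coordinates
    # Standard rigid-transform inverse: p_cam = R_cw^T (p_world - t_cw)
    return R_cw.T @ (np.asarray(p_world) - np.asarray(t_cw))

def qwp_ratio(num_queries_with_pose, num_queries):
    """Fraction of queries whose query frame received a pose estimate."""
    return num_queries_with_pose / num_queries

This makes clear why pose estimation dominates the benchmark: a query whose frame has no estimated pose cannot produce a displacement vector at all, so raising the QwP ratio directly raises the achievable overall success rate.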
