Paper title
Detecting Invisible People
Paper authors
Paper abstract
Monocular object detection and tracking have improved drastically in recent years, but rely on a key assumption: that objects are visible to the camera. Many offline tracking approaches reason about occluded objects post-hoc, by linking together tracklets after the object re-appears, making use of re-identification (ReID). However, online tracking in embodied robotic agents (such as a self-driving vehicle) fundamentally requires object permanence, which is the ability to reason about occluded objects before they re-appear. In this work, we re-purpose tracking benchmarks and propose new metrics for the task of detecting invisible objects, focusing on the illustrative case of people. We demonstrate that current detection and tracking systems perform dramatically worse on this task. We introduce two key innovations to recover much of this performance drop. First, we treat occluded object detection in temporal sequences as a short-term forecasting challenge, bringing to bear tools from dynamic sequence prediction. Second, we build dynamic models that explicitly reason in 3D, making use of observations produced by state-of-the-art monocular depth estimation networks. To our knowledge, ours is the first work to demonstrate the effectiveness of monocular depth estimation for the task of tracking and detecting occluded objects. In F1 score, our approach improves by 11.4% over the baseline in ablations and by 5.0% over the state of the art.
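The idea of forecasting an occluded person in 3D can be illustrated with a minimal sketch. It is not the paper's model: it assumes a pinhole camera with known intrinsics (fx, fy, cx, cy), a per-detection depth value taken from some monocular depth network, and a plain constant-velocity motion model. All function names are hypothetical.

```python
import numpy as np


def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with an estimated depth to a 3D camera-frame point.

    Assumes a pinhole camera; the intrinsics are illustrative inputs.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth], dtype=float)


def project(point, fx, fy, cx, cy):
    """Project a 3D camera-frame point back to pixel coordinates."""
    x, y, z = point
    return np.array([fx * x / z + cx, fy * y / z + cy])


def forecast_occluded(p_prev, p_last, steps):
    """Extrapolate the 3D position for `steps` frames after occlusion begins.

    Constant velocity is an assumption made for this sketch; it stands in
    for whatever dynamics model is actually used.
    """
    velocity = p_last - p_prev  # displacement per frame
    return [p_last + velocity * k for k in range(1, steps + 1)]


# Two visible observations of the same person, then two occluded frames.
fx, fy, cx, cy = 1000.0, 1000.0, 640.0, 360.0
p0 = backproject(740.0, 360.0, 10.0, fx, fy, cx, cy)
p1 = backproject(760.0, 360.0, 10.0, fx, fy, cx, cy)
predicted_3d = forecast_occluded(p0, p1, steps=2)
predicted_pixels = [project(p, fx, fy, cx, cy) for p in predicted_3d]
```

Reasoning in 3D rather than in image space means the extrapolated motion stays consistent with perspective: a person walking away from the camera shrinks and slows in pixel terms even when their metric velocity is constant.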