Paper Title
Unsupervised Multiple Person Tracking using AutoEncoder-Based Lifted Multicuts
Paper Authors
Paper Abstract
Multiple Object Tracking (MOT) is a long-standing task in computer vision. Current approaches based on the tracking-by-detection paradigm require either some form of domain knowledge or supervision to associate detections correctly into tracks. In this work, we present an unsupervised multiple object tracking approach based on visual features and minimum cost lifted multicuts. Our method relies on straightforward spatio-temporal cues that can be extracted from neighboring frames of an image sequence without supervision. Clustering based on these cues enables us to learn the appearance invariances required for the tracking task at hand and to train an autoencoder that generates suitable latent representations. The resulting latent representations can thus serve as robust appearance cues for tracking, even over large temporal distances where no reliable spatio-temporal features can be extracted. We show that, despite being trained without the provided annotations, our model achieves competitive results on the challenging MOT Benchmark for pedestrian tracking.
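The abstract outlines a pipeline in which an autoencoder, trained without annotations, maps person detections to latent codes that then serve as appearance cues for a minimum cost lifted multicut. The snippet below is a minimal, hypothetical sketch of such a component, not the authors' implementation: a small convolutional autoencoder over detection crops and an illustrative pairwise edge cost; the crop size, layer widths, and cost scaling are assumptions made for illustration.

# Hypothetical sketch (not the paper's code): a convolutional autoencoder that
# maps 3x64x64 detection crops to latent vectors, plus an illustrative pairwise
# cost that a minimum cost (lifted) multicut solver could use as edge weights.
import torch
import torch.nn as nn

class CropAutoEncoder(nn.Module):
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        # Encoder: 3x64x64 crop -> latent vector
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),   # 16x32x32
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32x16x16
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 64x8x8
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, latent_dim),
        )
        # Decoder mirrors the encoder so a reconstruction loss can be applied
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def pairwise_cut_cost(z_a: torch.Tensor, z_b: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    """Illustrative appearance cost for an edge between two detections:
    similar latent codes give a negative cost (favor merging into one track),
    dissimilar codes give a positive cost (favor cutting the edge)."""
    return scale * (torch.norm(z_a - z_b, dim=-1) - 1.0)

if __name__ == "__main__":
    model = CropAutoEncoder()
    crops = torch.rand(4, 3, 64, 64)             # four dummy detection crops
    recon, latents = model(crops)
    loss = nn.functional.mse_loss(recon, crops)  # unsupervised reconstruction objective
    print(loss.item(), pairwise_cut_cost(latents[0], latents[1]).item())

In such a setup, the reconstruction loss requires no labels, and the resulting latent distances would populate the (lifted) edge costs of the tracking graph; the actual cue extraction, clustering, and solver used in the paper are not reproduced here.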