Paper Title
Context-aware and Scale-insensitive Temporal Repetition Counting
Paper Authors
Paper Abstract
Temporal repetition counting aims to estimate the number of cycles of a given repetitive action. Existing deep learning methods assume that repetitive actions are performed at a fixed time scale, which does not hold for the complex repetitive actions of real life. In this paper, we tailor a context-aware and scale-insensitive framework to tackle the challenges in repetition counting caused by unknown and diverse cycle lengths. Our approach combines two key insights: (1) cycle lengths of different actions are unpredictable and require large-scale searching, but once a coarse cycle length is determined, the variation between repetitions can be overcome by regression; (2) determining the cycle length cannot rely only on a short fragment of video but requires contextual understanding. The first insight is implemented by a coarse-to-fine cycle refinement method. It avoids the heavy computation of exhaustively searching all cycle lengths in the video and instead propagates the coarse prediction for further refinement in a hierarchical manner. For the second, we propose a bidirectional cycle length estimation method for context-aware prediction: a regression network that takes two consecutive coarse cycles as input and predicts the locations of the previous and next repetitive cycles. To benefit training and evaluation in temporal repetition counting, we construct a new benchmark, the largest to date, containing 526 videos with diverse repetitive actions. Extensive experiments show that the proposed network, trained on a single dataset, outperforms state-of-the-art methods on several benchmarks, indicating that the framework is general enough to capture repetition patterns across domains.
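The coarse-to-fine idea in the abstract can be illustrated with a minimal, hypothetical sketch: first score only a sparse set of candidate cycle lengths, then refine locally around the best coarse estimate instead of exhaustively testing every length. This stand-in uses plain autocorrelation on a 1-D toy signal in place of the paper's learned regression network; all function names and parameters below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def coarse_cycle_length(signal, candidate_lengths):
    """Coarse stage: score a sparse set of candidate cycle lengths
    by correlating the signal with a lagged copy of itself.
    (Illustrative stand-in for a learned coarse predictor.)"""
    best_len, best_score = None, -np.inf
    for length in candidate_lengths:
        if length >= len(signal):
            continue
        score = np.corrcoef(signal[:-length], signal[length:])[0, 1]
        if score > best_score:
            best_len, best_score = length, score
    return best_len

def refine_cycle_length(signal, coarse_len, radius=4):
    """Fine stage: search only a small window around the coarse
    estimate, mirroring the hierarchical propagate-and-refine idea."""
    candidates = range(max(2, coarse_len - radius), coarse_len + radius + 1)
    return coarse_cycle_length(signal, candidates)

# Toy repetitive "action": a sine wave with a true period of 27 frames.
t = np.arange(300)
signal = np.sin(2 * np.pi * t / 27)

coarse = coarse_cycle_length(signal, candidate_lengths=range(10, 50, 15))
fine = refine_cycle_length(signal, coarse)
count = len(t) // fine
print(f"coarse={coarse}, refined={fine}, count={count}")
```

The coarse pass tests only a handful of widely spaced lengths, and the fine pass touches just a few neighbors of the winner, which is the computational saving the abstract alludes to; the paper's actual method additionally conditions on two consecutive cycles for bidirectional, context-aware regression.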