Paper Title

TFCNet: Temporal Fully Connected Networks for Static Unbiased Temporal Reasoning

Authors

Zhang, Shiwen

Abstract

Temporal reasoning is an important capability of visual intelligence. In the computer vision research community, temporal reasoning is usually studied in the form of video classification, for which many state-of-the-art neural network architectures and dataset benchmarks have been proposed in recent years, most notably 3D CNNs and Kinetics. However, several recent works found that current video classification benchmarks contain strong biases towards static features and thus cannot accurately reflect temporal modeling ability. New video classification benchmarks aiming to eliminate static biases have been proposed, and experiments on these benchmarks show that clip-based 3D CNNs are outperformed by RNN structures and recent video transformers. In this paper, we find that 3D CNNs and their efficient depthwise variants, when a video-level sampling strategy is used, can actually beat RNNs and recent vision transformers by significant margins on static-unbiased temporal reasoning benchmarks. Further, we propose the Temporal Fully Connected block (TFC block), an efficient and effective component that approximates fully connected layers along the temporal dimension to obtain a video-level receptive field, enhancing spatiotemporal reasoning ability. With TFC blocks inserted into video-level 3D CNNs (V3D), our proposed TFCNets establish new state-of-the-art results on the synthetic temporal reasoning benchmark CATER and the real-world static-unbiased dataset Diving48, surpassing all previous methods.
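The key idea behind the TFC block, a fully connected layer applied along the temporal dimension so that every output frame depends on all input frames, can be illustrated with a minimal pure-Python sketch. The function name and the single-channel, single-location simplification below are ours for illustration, not the paper's exact design:

```python
def temporal_fc(frames, weights, bias):
    """Apply a fully connected layer along the temporal dimension.

    A hypothetical, simplified sketch of the TFC idea: `frames` holds the
    feature values of one channel at one spatial location across T frames.
    Each output frame is a learned weighted sum over ALL input frames, so
    the layer has a video-level (global) temporal receptive field, unlike
    a temporal convolution, whose receptive field is limited by its kernel.

    frames:  list of T per-frame feature values
    weights: T x T matrix; weights[t_out][t_in] mixes input frame t_in
             into output frame t_out
    bias:    list of T bias terms
    """
    T = len(frames)
    return [
        sum(weights[t_out][t_in] * frames[t_in] for t_in in range(T)) + bias[t_out]
        for t_out in range(T)
    ]
```

With identity weights the layer passes frames through unchanged; with uniform weights each output frame becomes the average over the whole video, showing that a single layer can aggregate information from every frame at once.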
