Paper Title
Temporal Lift Pooling for Continuous Sign Language Recognition
Paper Authors
Abstract
Pooling methods are a necessity in modern neural networks for increasing receptive fields and lowering computational costs. However, commonly used hand-crafted pooling approaches, e.g., max pooling and average pooling, may not preserve discriminative features well. While many researchers have elaborately designed various pooling variants in the spatial domain to handle these limitations with much progress, the temporal aspect is rarely visited, where directly applying hand-crafted methods or these specialized spatial variants may not be optimal. In this paper, we derive Temporal Lift Pooling (TLP) from the Lifting Scheme in signal processing to intelligently downsample features of different temporal hierarchies. The Lifting Scheme factorizes input signals into sub-bands of different frequencies, which can be viewed as different temporal movement patterns. Our TLP is a three-stage procedure that performs signal decomposition, component weighting and information fusion to generate a refined, downsized feature map. We select a typical temporal task with long sequences, i.e., continuous sign language recognition (CSLR), as our testbed to verify the effectiveness of TLP. Experiments on two large-scale datasets show that TLP outperforms hand-crafted methods and specialized spatial variants by a large margin (1.5%) with similar computational overhead. As a robust feature extractor, TLP exhibits great generalizability across multiple backbones on various datasets and achieves new state-of-the-art results on two large-scale CSLR datasets. Visualizations further demonstrate the mechanism of TLP in correcting gloss borders. Code is released.
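The three stages described in the abstract (signal decomposition via a lifting step, component weighting, and information fusion) can be illustrated with a minimal numpy sketch. This is an assumption-laden toy version, not the paper's method: the paper learns its predictor and updater, whereas here we use a fixed Haar-style predict/update pair, and the scalar sub-band weights `w_approx`/`w_detail` and the additive fusion are hypothetical choices made for illustration.

```python
import numpy as np

def temporal_lift_pool(x, w_approx=1.0, w_detail=0.5):
    """Downsample a temporal feature map x of shape (T, C) by 2
    with a lifting-scheme step (toy sketch, not the learned TLP).

    Stage 1 -- signal decomposition: split frames into even/odd,
    predict odd from even (high-frequency detail), then update even
    with the residual (low-frequency approximation), Haar-style.
    Stage 2 -- component weighting: scale each sub-band.
    Stage 3 -- information fusion: merge sub-bands into one output.
    """
    T = x.shape[0] - x.shape[0] % 2     # drop a trailing frame if T is odd
    even, odd = x[0:T:2], x[1:T:2]
    detail = odd - even                 # predict step: high-frequency sub-band
    approx = even + 0.5 * detail        # update step: low-frequency sub-band
    # Fuse weighted sub-bands into a (T//2, C) feature map.
    return w_approx * approx + w_detail * detail

# A linear ramp: detail is constant, approx tracks the midpoints.
x = np.arange(8, dtype=float).reshape(8, 1)
y = temporal_lift_pool(x)               # shape (4, 1)
```

For this ramp, `even = [0, 2, 4, 6]` and `odd = [1, 3, 5, 7]`, so `detail` is all ones, `approx = [0.5, 2.5, 4.5, 6.5]`, and the fused output is `[1, 3, 5, 7]`: the low-frequency sub-band carries the trend while the detail sub-band captures frame-to-frame motion, which is what TLP weights and fuses adaptively.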