Paper Title

Lip-reading with Densely Connected Temporal Convolutional Networks

Authors

Pingchuan Ma, Yujiang Wang, Jie Shen, Stavros Petridis, Maja Pantic

Abstract

In this work, we present the Densely Connected Temporal Convolutional Network (DC-TCN) for lip-reading of isolated words. Although Temporal Convolutional Networks (TCN) have recently demonstrated great potential in many vision tasks, their receptive fields are not dense enough to model the complex temporal dynamics in lip-reading scenarios. To address this problem, we introduce dense connections into the network to capture more robust temporal features. Moreover, our approach utilises the Squeeze-and-Excitation block, a lightweight attention mechanism, to further enhance the model's classification power. Without bells and whistles, our DC-TCN method has achieved 88.36% accuracy on the Lip Reading in the Wild (LRW) dataset and 43.65% on the LRW-1000 dataset, which has surpassed all the baseline methods and is the new state-of-the-art on both datasets.
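
The two ingredients named in the abstract, dense connections and Squeeze-and-Excitation gating, can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract gives no layer sizes, kernel sizes, dilations, or reduction ratio, so the helper names (`toy_temporal_layer`, `dense_block`, `squeeze_excite`), the averaging "convolution", and the hand-picked weights below are all hypothetical, and a real model would use learned convolutions in a deep learning framework.

```python
import math


def toy_temporal_layer(dilation):
    """Hypothetical stand-in for a dilated temporal convolution (kernel size 3).

    Averages the taps t - dilation, t, t + dilation over all input channels,
    with zero padding at the borders, and returns a single new channel.
    """
    def layer(channels):
        steps = len(channels[0])
        out = []
        for t in range(steps):
            taps = [t - dilation, t, t + dilation]
            total = sum(ch[i] for ch in channels for i in taps if 0 <= i < steps)
            out.append(total / (3 * len(channels)))
        return [out]
    return layer


def dense_block(x, layers):
    """Dense connectivity: each layer sees the input and all earlier outputs."""
    collected = list(x)
    for layer in layers:
        # Concatenate the new channels onto everything produced so far.
        collected = collected + layer(collected)
    return collected


def squeeze_excite(features, w1, w2):
    """Squeeze-and-Excitation over a (channels x time) feature map, biases omitted."""
    # Squeeze: global average over time gives one scalar per channel.
    squeezed = [sum(ch) / len(ch) for ch in features]
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid gates.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: reweight every channel by its gate.
    return [[g * v for v in ch] for g, ch in zip(gates, features)]


# Two input channels over four time steps (made-up values).
x = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0]]
dense = dense_block(x, [toy_temporal_layer(1), toy_temporal_layer(2)])

# Hand-picked weights: 4 channels squeezed to 1 hidden unit, then back to 4 gates.
w1 = [[0.5, 0.5, 0.5, 0.5]]
w2 = [[1.0], [0.0], [-1.0], [0.0]]
recalibrated = squeeze_excite(dense, w1, w2)
print(len(recalibrated), len(recalibrated[0]))
```

Because each layer's output is concatenated rather than replaced, later layers with different dilations can combine features from several temporal scales at once, which is one way to read the abstract's point about denser receptive fields. The gating step then rescales each channel using a summary of the whole sequence.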
