Training Strategies for Improved Lip-reading

Paper Title

Training Strategies for Improved Lip-reading

Authors

Pingchuan Ma, Yujiang Wang, Stavros Petridis, Jie Shen, Maja Pantic

Abstract

Several training strategies and temporal models have recently been proposed for isolated word lip-reading in a series of independent works. However, the potential of combining the best strategies and investigating the impact of each of them has not been explored. In this paper, we systematically investigate the performance of state-of-the-art data augmentation approaches, temporal models and other training strategies, such as self-distillation and using word boundary indicators. Our results show that Time Masking (TM) is the most important augmentation, followed by mixup, and that Densely-Connected Temporal Convolutional Networks (DC-TCN) are the best temporal model for lip-reading of isolated words. Using self-distillation and word boundary indicators is also beneficial, but to a lesser extent. A combination of all the above methods results in a classification accuracy of 93.4%, which is an absolute improvement of 4.6% over the current state-of-the-art performance on the LRW dataset. The performance can be further improved to 94.1% by pre-training on additional datasets. An error analysis of the various training strategies reveals that the performance improves by increasing the classification accuracy of hard-to-recognise words.
