The Importance of the Instantaneous Phase in Detecting Faces with Convolutional Neural Networks

Title

The Importance of the Instantaneous Phase in Detecting Faces with Convolutional Neural Networks

Author

Tapia, Luis Sanchez

Abstract

Convolutional neural networks (CNNs) have provided new and accurate methods for processing digital images and videos. Yet training CNNs is extremely demanding in terms of computational resources. For specific applications, the standard use of transfer learning also tends to require far more resources than may be needed. Furthermore, the final systems tend to operate as black boxes that are difficult to interpret. This thesis considers the problem of detecting faces in the AOLME video dataset, a large video collection of group interactions recorded in unconstrained classroom environments. For the thesis, one still frame per minute was extracted from each of 18 videos, each 24 minutes long. Each frame was then divided into a 9x5 grid of blocks of 50x50 pixels each. For each of the resulting 19,440 blocks (18 videos x 24 frames x 45 blocks), the percentage of face pixels was set as ground truth. Face detection was then defined as a regression problem: predicting the face pixel percentage of each block. To test different methods, 12 videos were used for training and validation and the remaining 6 for testing. The thesis examines the impact of using the instantaneous phase in this block-based face detection application. It compares three inputs: the frequency modulation (FM) image based on the instantaneous phase, the instantaneous amplitude (AM), and the original grayscale image. To generate the FM and AM inputs, the thesis uses dominant component analysis, which aims to decrease the training overhead while maintaining interpretability.
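The block-level ground truth described in the abstract can be illustrated with a minimal sketch. It is not the thesis code. It assumes that "9x5" means 9 blocks across and 5 blocks down (a 450x250 pixel frame), and that face annotations are available as a binary mask. The names `block_face_percentages`, `total_blocks`, and `face_mask` are hypothetical.

```python
BLOCK = 50          # block side in pixels (from the abstract)
COLS, ROWS = 9, 5   # assumed orientation: 9 blocks across, 5 blocks down


def block_face_percentages(face_mask):
    """Return a ROWS x COLS nested list of face-pixel percentages.

    face_mask is a hypothetical binary mask (truthy = face pixel) given as a
    list of ROWS * BLOCK rows, each with COLS * BLOCK entries. How the thesis
    stores its annotations is not stated in the abstract.
    """
    height, width = ROWS * BLOCK, COLS * BLOCK
    if len(face_mask) != height or any(len(row) != width for row in face_mask):
        raise ValueError(f"expected a {height}x{width} mask")

    percentages = []
    for r in range(ROWS):
        row_out = []
        for c in range(COLS):
            # Count face pixels inside the block at grid position (r, c).
            face_pixels = sum(
                1
                for y in range(r * BLOCK, (r + 1) * BLOCK)
                for x in range(c * BLOCK, (c + 1) * BLOCK)
                if face_mask[y][x]
            )
            row_out.append(100.0 * face_pixels / (BLOCK * BLOCK))
        percentages.append(row_out)
    return percentages


def total_blocks(videos=18, frames_per_video=24):
    """Number of labelled blocks: videos x frames per video x blocks per frame."""
    return videos * frames_per_video * ROWS * COLS
```

Each value returned by `block_face_percentages` would serve as the regression target for the corresponding 50x50 block, and `total_blocks()` reproduces the block count quoted in the abstract from 18 videos, 24 frames per video, and 45 blocks per frame.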
