A multimodal approach for multi-label movie genre classification

Paper Title

A multimodal approach for multi-label movie genre classification

Authors

Mangolin, Rafael B., Pereira, Rodolfo M., Britto Jr., Alceu S., Silla Jr., Carlos N., Feltrim, Valéria D., Bertolini, Diego, Costa, Yandre M. G.

Abstract

Movie genre classification is a challenging task that has increasingly attracted the attention of researchers. In this paper, we address the multi-label classification of movie genres in a multimodal way. For this purpose, we created a dataset composed of trailer video clips, subtitles, synopses, and movie posters taken from 152,622 movie titles from The Movie Database. The dataset was carefully curated and organized, and it is also made available as a contribution of this work. Each movie in the dataset is labeled according to a set of eighteen genre labels. We extracted features from these data using different kinds of descriptors, namely Mel Frequency Cepstral Coefficients (MFCC), Statistical Spectrum Descriptor (SSD), Local Binary Pattern (LBP) with spectrograms, Long Short-Term Memory (LSTM), and Convolutional Neural Networks (CNN). The descriptors were evaluated using different classifiers, such as Binary Relevance and ML-kNN. We also investigated the performance of combining different classifiers/features using a late fusion strategy, which obtained encouraging results. Based on the F-Score metric, our best result, 0.628, was obtained by fusing a classifier created using LSTM on the synopses with a classifier created using CNN on movie trailer frames. When considering the AUC-PR metric, the best result, 0.673, was also achieved by combining those representations, with the addition of an LSTM-based classifier created from the subtitles. These results corroborate the existence of complementarity among classifiers based on different sources of information in this field of application. As far as we know, this is the most comprehensive study developed in terms of the diversity of multimedia sources of information used to perform movie genre classification.
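To make the idea of late fusion concrete, here is a minimal sketch of how per-genre scores from two independently trained classifiers (for example, one on synopses and one on trailer frames) might be combined and then evaluated with an F-score. The abstract does not state the exact fusion rule, decision threshold, or F-score averaging, so this sketch assumes the mean (sum) rule, a 0.5 threshold, and micro-averaging. The function names (`fuse_mean`, `threshold`, `micro_f_score`), the three-genre label set, and the toy scores are hypothetical and are not data or results from the paper.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def fuse_mean(*score_sets: Sequence[Sequence[float]]) -> Matrix:
    """Late fusion (assumed mean/sum rule): average each genre's score across classifiers."""
    n_models = len(score_sets)
    n_movies = len(score_sets[0])
    fused = []
    for i in range(n_movies):
        n_genres = len(score_sets[0][i])
        fused.append([sum(s[i][j] for s in score_sets) / n_models for j in range(n_genres)])
    return fused


def threshold(scores: Matrix, t: float = 0.5) -> List[List[int]]:
    """Turn fused scores into a binary label set per movie (assumed threshold t)."""
    return [[1 if p >= t else 0 for p in row] for row in scores]


def micro_f_score(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives over all genres."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical scores for 2 movies x 3 genres (e.g. Action, Comedy, Drama).
synopsis_lstm = [[0.8, 0.2, 0.6], [0.1, 0.7, 0.4]]
trailer_cnn = [[0.6, 0.4, 0.2], [0.3, 0.5, 0.8]]
y_true = [[1, 0, 1], [0, 1, 1]]

fused = fuse_mean(synopsis_lstm, trailer_cnn)
y_pred = threshold(fused)
print(y_pred, round(micro_f_score(y_true, y_pred), 3))
```

Running this prints `[[1, 0, 0], [0, 1, 1]] 0.857`: the fused model recovers three of the four true genre labels with no false positives. The mean rule is only one option; product or max rules, or a per-genre tuned threshold, could be swapped in without changing the overall late-fusion structure, since each classifier is still trained on its own modality and combined only at the score level.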
