Multi-Modal Experience Inspired AI Creation

Paper Title

Multi-Modal Experience Inspired AI Creation

Authors

Qian Cao, Xu Chen, Ruihua Song, Hao Jiang, Guang Yang, Zhao Cao

Abstract

AI creation, such as poem or lyrics generation, has attracted increasing attention from both industry and academia, and many promising models have been proposed in the past few years. Existing methods usually estimate their outputs from single, independent pieces of visual or textual information. In reality, however, humans usually create according to their experiences, which may involve different modalities and be sequentially correlated. To model this human capability, in this paper we define and solve a novel AI creation problem based on human experiences. More specifically, we study how to generate texts from sequential multi-modal information. Compared with previous work, this task is much more difficult because the model must understand and adapt the semantics across different modalities and effectively convert them into the output in a sequential manner. To alleviate these difficulties, we first design a multi-channel sequence-to-sequence architecture equipped with a multi-modal attention network. For more effective optimization, we then propose a curriculum negative sampling strategy tailored to the sequential inputs. To benchmark this problem and demonstrate the effectiveness of our model, we manually label a new multi-modal experience dataset. With this dataset, we conduct extensive experiments comparing our model with a series of representative baselines, and demonstrate significant improvements on both automatic and human-centered metrics. The code and data are available at: https://github.com/Aman-4-Real/MMTG
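The abstract names a curriculum negative sampling strategy for sequential inputs but does not describe how it works. The sketch below is only a rough illustration of the general idea, not the authors' procedure. It assumes that a negative example is built by swapping items of an input sequence for items drawn from elsewhere, and that negatives become harder (fewer items swapped) as training progresses. The function names, the linear schedule, and the `start`/`end` fractions are all hypothetical.

```python
import random


def replace_fraction(epoch, total_epochs, start=1.0, end=0.2):
    """Linearly anneal the fraction of items to corrupt from `start` to `end`.

    Assumed schedule: early epochs corrupt most of the sequence (easy
    negatives), late epochs corrupt only a small part (hard negatives).
    """
    if total_epochs <= 1:
        return end
    progress = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start + (end - start) * progress


def make_negative(sequence, pool, epoch, total_epochs, rng=random):
    """Return a corrupted copy of `sequence` to use as a negative example.

    `pool` holds candidate replacement items (for example, images or
    sentences taken from other sequences). Items already present in
    `sequence` are excluded so every swapped position really changes.
    """
    frac = replace_fraction(epoch, total_epochs)
    n_replace = max(1, round(frac * len(sequence)))
    positions = rng.sample(range(len(sequence)), n_replace)
    candidates = [item for item in pool if item not in sequence]
    negative = list(sequence)
    for pos in positions:
        negative[pos] = rng.choice(candidates)
    return negative
```

Calling `make_negative(seq, pool, epoch=0, total_epochs=5)` would corrupt every position of a five-item sequence, while `epoch=4` would corrupt only one, so the model sees progressively more subtle negatives under this assumed schedule.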
