MRRC: Multiple Role Representation Crossover Interpretation for Image Captioning With R-CNN Feature Distribution Composition (FDC)

Paper Title

MRRC: Multiple Role Representation Crossover Interpretation for Image Captioning With R-CNN Feature Distribution Composition (FDC)

Authors

Sur, Chiranjib

Abstract

While image captioning by machines requires structured learning and a basis for interpretation, improving it requires understanding and processing multiple contexts in a meaningful way. This research provides a novel concept for context combination and will impact many applications that treat visual features as equivalent to descriptions of objects, activities and events. Our architecture has three components: Feature Distribution Composition (FDC) Layer Attention, the Multiple Role Representation Crossover (MRRC) Attention Layer and the Language Decoder. FDC Layer Attention helps generate the weighted attention from R-CNN features; the MRRC Attention Layer acts as intermediate representation processing and helps generate the attention for the next word; and the Language Decoder helps estimate the likelihood of the next probable word in the sentence. We demonstrate the effectiveness of FDC, MRRC, regional object feature attention and reinforcement learning for learning to generate better captions from images. Our model improved on previous performance by 35.3% and created a new standard and theory for representation generation based on logic, better interpretability and contexts.
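
The abstract does not give the equations behind FDC Layer Attention, so the following is only a generic soft-attention sketch of what "weighted attention from R-CNN features" might look like in its simplest form. It is not the author's method. The names `attend`, `region_feats`, and `query`, the dot-product scoring, and the toy vectors are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_feats, query):
    """Generic soft attention (assumed form, not the paper's FDC layer).

    region_feats: list of N vectors standing in for R-CNN region features.
    query: a vector of the same dimension, e.g. a decoder state (hypothetical).
    Returns the attention weights and the weighted sum of the region features.
    """
    # Score each region by its dot product with the query.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in region_feats]
    weights = softmax(scores)
    dim = len(query)
    # Context vector: attention-weighted sum of region features.
    context = [
        sum(w * feat[i] for w, feat in zip(weights, region_feats))
        for i in range(dim)
    ]
    return weights, context


# Toy data: three 2-dimensional "regions" and one query vector.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
weights, context = attend(regions, query)
print(weights, context)
```

In this toy run the first and third regions receive equal weight because they score identically against the query, and the second region receives the least. A language decoder would then consume `context` when predicting the next word.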
