Paper Title


Distill the Image to Nowhere: Inversion Knowledge Distillation for Multimodal Machine Translation

Paper Authors

Ru Peng, Yawen Zeng, Junbo Zhao

Paper Abstract


Past works on multimodal machine translation (MMT) elevate the bilingual setup by incorporating additional aligned vision information. However, the image-must requirement of multimodal datasets largely hinders MMT's development -- namely, it demands an aligned form of [image, source text, target text]. This limitation is generally troublesome during the inference phase, especially when no aligned image is provided, as in the normal NMT setup. Thus, in this work, we introduce IKD-MMT, a novel MMT framework that supports an image-free inference phase via an inversion knowledge distillation scheme. In particular, a multimodal feature generator is trained with a knowledge distillation module, which directly generates multimodal features from (only) the source text as input. While a few prior works have entertained the possibility of supporting image-free inference for machine translation, their performance has yet to rival image-must translation. In our experiments, we identify our method as the first image-free approach to comprehensively rival or even surpass (almost) all image-must frameworks, and it achieves state-of-the-art results on the often-used Multi30k benchmark. Our code and data are available at: https://github.com/pengr/IKD-mmt/tree/master.
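The core idea described in the abstract -- training a text-only feature generator to mimic the multimodal features an image encoder would produce, so that no image is needed at inference -- can be sketched as follows. This is a minimal illustration, not the paper's implementation: the module names, dimensions, and the choice of an L2 distillation objective are assumptions for the sake of the example.

```python
import torch
import torch.nn as nn

class FeatureGenerator(nn.Module):
    """Student: maps pooled source-text features to pseudo visual features.

    At training time its output is matched against a pretrained image
    encoder (the teacher); at inference time no image is required.
    """
    def __init__(self, d_text: int, d_visual: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_text, d_visual),
            nn.ReLU(),
            nn.Linear(d_visual, d_visual),
        )

    def forward(self, text_feats: torch.Tensor) -> torch.Tensor:
        # text_feats: (batch, d_text) pooled source-text representation
        return self.net(text_feats)

def distillation_loss(generated: torch.Tensor,
                      teacher_visual: torch.Tensor) -> torch.Tensor:
    # Pull the generated features toward the teacher's image features;
    # an L2 (MSE) objective is one common choice for feature distillation.
    return nn.functional.mse_loss(generated, teacher_visual)

# Toy usage with random tensors standing in for real encoder outputs.
gen = FeatureGenerator(d_text=512, d_visual=2048)
text_feats = torch.randn(4, 512)       # from the MT text encoder
teacher_feats = torch.randn(4, 2048)   # from a pretrained image encoder
loss = distillation_loss(gen(text_feats), teacher_feats)
```

During translation, the generated pseudo visual features would simply replace the real image features in the MMT decoder, which is what makes the inference phase image-free.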
