Paper Title

Cross-Lingual Text Classification with Multilingual Distillation and Zero-Shot-Aware Training

Paper Authors

Ziqing Yang, Yiming Cui, Zhigang Chen, Shijin Wang

Paper Abstract

Multilingual pre-trained language models (MPLMs) not only can handle tasks in different languages but also exhibit surprising zero-shot cross-lingual transferability. However, MPLMs usually are not able to achieve comparable supervised performance on rich-resource languages compared to the state-of-the-art monolingual pre-trained models. In this paper, we aim to improve the multilingual model's supervised and zero-shot performance simultaneously only with the resources from supervised languages. Our approach is based on transferring knowledge from high-performance monolingual models with a teacher-student framework. We let the multilingual model learn from multiple monolingual models simultaneously. To exploit the model's cross-lingual transferability, we propose MBLM (multi-branch multilingual language model), a model built on the MPLMs with multiple language branches. Each branch is a stack of transformers. MBLM is trained with the zero-shot-aware training strategy that encourages the model to learn from the mixture of zero-shot representations from all the branches. The results on two cross-lingual classification tasks show that, with only the task's supervised data used, our method improves both the supervised and zero-shot performance of MPLMs.
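To make the architecture described in the abstract more concrete, below is a minimal, illustrative PyTorch sketch (not the authors' released code): a shared multilingual encoder body feeds several per-language branches, each a small stack of transformer layers; the classifier scores both the example's own-branch representation and an averaged mixture of the other branches' zero-shot representations, and a standard soft-label distillation loss transfers knowledge from a monolingual teacher. All module names, dimensions, and the simple averaging used for the "mixture" are assumptions for illustration, not details taken from the paper.

```python
# Minimal illustrative sketch, NOT the authors' implementation: the branch
# structure, dimensions, and the simple averaging used as the "mixture of
# zero-shot representations" are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiBranchHead(nn.Module):
    """Per-language transformer branches stacked on top of a shared MPLM body."""

    def __init__(self, num_branches: int, hidden: int = 768,
                 layers_per_branch: int = 2, num_labels: int = 3):
        super().__init__()
        # Each language branch is a small stack of transformer encoder layers.
        self.branches = nn.ModuleList(
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=hidden, nhead=12, batch_first=True),
                num_layers=layers_per_branch,
            )
            for _ in range(num_branches)
        )
        self.classifier = nn.Linear(hidden, num_labels)

    def forward(self, shared_states: torch.Tensor, branch_id: int):
        # shared_states: hidden states from the shared multilingual encoder,
        # shape (batch, seq_len, hidden).
        branch_outputs = [branch(shared_states)[:, 0] for branch in self.branches]  # [CLS] vectors
        own = branch_outputs[branch_id]
        others = [h for i, h in enumerate(branch_outputs) if i != branch_id]
        zero_shot_mix = torch.stack(others).mean(dim=0)  # naive average over the other branches
        return self.classifier(own), self.classifier(zero_shot_mix)


def soft_distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """Standard soft-label KD loss from a (monolingual) teacher to the student."""
    t = temperature
    return F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)


if __name__ == "__main__":
    # Toy forward pass: 4 language branches, a batch of 2 sequences of length 16.
    head = MultiBranchHead(num_branches=4)
    shared_states = torch.randn(2, 16, 768)   # stands in for the MPLM body's output
    teacher_logits = torch.randn(2, 3)        # stands in for a monolingual teacher's logits
    own_logits, zs_logits = head(shared_states, branch_id=0)
    loss = (soft_distillation_loss(own_logits, teacher_logits)
            + soft_distillation_loss(zs_logits, teacher_logits))
    print(loss.item())
```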
