Paper Title
Multi-Head Self-Attention with Role-Guided Masks
Paper Authors
Paper Abstract
The state of the art in learning meaningful semantic representations of words is the Transformer model and its attention mechanisms. Simply put, the attention mechanisms learn to attend to specific parts of the input, dispensing with recurrence and convolutions. While some of the learned attention heads have been found to play linguistically interpretable roles, they can be redundant or prone to errors. We propose a method to guide the attention heads towards roles identified in prior work as important. We do this by defining role-specific masks to constrain the heads to attend to specific parts of the input, such that different heads are designed to play different roles. Experiments on text classification and machine translation using 7 different datasets show that our method outperforms competitive attention-based, CNN, and RNN baselines.
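To illustrate the core idea, below is a minimal sketch (assuming PyTorch) of attention heads constrained by role-specific binary masks: positions disallowed by a head's mask are set to negative infinity before the softmax, so that head can only attend to the part of the input its role prescribes. The helper names (`masked_attention`, `previous_token_mask`, `next_token_mask`) and the two example roles shown are illustrative assumptions, not the exact roles or mask definitions used in the paper.

```python
# Sketch of role-guided masked attention (assumed PyTorch implementation).
import torch
import torch.nn.functional as F


def masked_attention(q, k, v, role_mask):
    """Scaled dot-product attention restricted by a role-specific mask.

    q, k, v:   (batch, seq_len, d_head)
    role_mask: (seq_len, seq_len) boolean, True where attention is allowed.
    """
    d_head = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5      # (batch, seq, seq)
    scores = scores.masked_fill(~role_mask, float("-inf"))  # block disallowed positions
    return F.softmax(scores, dim=-1) @ v


def previous_token_mask(seq_len):
    # Example role: each position may attend only to the token before it
    # (the first position falls back to itself so no row is all -inf).
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    mask[0, 0] = True
    for i in range(1, seq_len):
        mask[i, i - 1] = True
    return mask


def next_token_mask(seq_len):
    # Example role: each position may attend only to the token after it.
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    mask[seq_len - 1, seq_len - 1] = True
    for i in range(seq_len - 1):
        mask[i, i + 1] = True
    return mask


if __name__ == "__main__":
    batch, seq_len, d_head = 2, 5, 8
    q = torch.randn(batch, seq_len, d_head)
    k = torch.randn(batch, seq_len, d_head)
    v = torch.randn(batch, seq_len, d_head)

    # Two heads playing different roles via different masks.
    out_prev = masked_attention(q, k, v, previous_token_mask(seq_len))
    out_next = masked_attention(q, k, v, next_token_mask(seq_len))
    print(out_prev.shape, out_next.shape)
```

In this sketch, guiding a head toward a role only requires swapping in a different mask; the attention computation itself is unchanged, which is what allows role-guided heads to drop into a standard multi-head self-attention layer.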