Paper Title

On the Locality of Attention in Direct Speech Translation

Paper Authors

Belen Alastruey, Javier Ferrando, Gerard I. Gállego, Marta R. Costa-jussà

Paper Abstract

Transformers have achieved state-of-the-art results across multiple NLP tasks. However, the self-attention mechanism complexity scales quadratically with the sequence length, creating an obstacle for tasks involving long sequences, like in the speech domain. In this paper, we discuss the usefulness of self-attention for Direct Speech Translation. First, we analyze the layer-wise token contributions in the self-attention of the encoder, unveiling local diagonal patterns. To prove that some attention weights are avoidable, we propose to substitute the standard self-attention with a local efficient one, setting the amount of context used based on the results of the analysis. With this approach, our model matches the baseline performance, and improves the efficiency by skipping the computation of those weights that standard attention discards.
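
As a rough illustration of the local attention pattern the abstract describes, below is a minimal PyTorch sketch of windowed (banded) self-attention. This is not the authors' implementation: `local_self_attention` and `window_size` are hypothetical names, with `window_size` standing in for the amount of context chosen from the layer-wise attention analysis.

```python
import torch
import torch.nn.functional as F

def local_self_attention(q, k, v, window_size):
    """Scaled dot-product self-attention restricted to a diagonal band.

    q, k, v: tensors of shape (batch, seq_len, d_head).
    window_size: number of positions attended on each side of the diagonal
    (hypothetical parameter; stands in for the context amount derived from
    the attention analysis).
    """
    seq_len, d_head = q.size(1), q.size(2)
    scores = torch.matmul(q, k.transpose(-2, -1)) / d_head ** 0.5

    # Band mask: position i may only attend to j with |i - j| <= window_size.
    idx = torch.arange(seq_len, device=q.device)
    band = (idx[None, :] - idx[:, None]).abs() <= window_size
    scores = scores.masked_fill(~band, float("-inf"))

    weights = F.softmax(scores, dim=-1)
    return torch.matmul(weights, v)


# Toy usage: 2 utterances, 100 speech frames, 64-dim heads, ±8-frame window.
x = torch.randn(2, 100, 64)
out = local_self_attention(x, x, x, window_size=8)
print(out.shape)  # torch.Size([2, 100, 64])
```

Note that this masking formulation only visualizes the local pattern; it still materializes the full score matrix. An efficient local attention implementation (e.g., a sliding-window scheme in the style of Longformer) skips the computation of the out-of-band scores entirely, which is where the efficiency gain reported in the abstract comes from.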
