Paper Title

Hierarchical Phrase-based Sequence-to-Sequence Learning

Paper Authors

Bailin Wang, Ivan Titov, Jacob Andreas, Yoon Kim

Paper Abstract

We describe a neural transducer that maintains the flexibility of standard sequence-to-sequence (seq2seq) models while incorporating hierarchical phrases as a source of inductive bias during training and as explicit constraints during inference. Our approach trains two models: a discriminative parser based on a bracketing transduction grammar whose derivation tree hierarchically aligns source and target phrases, and a neural seq2seq model that learns to translate the aligned phrases one by one. We use the same seq2seq model to translate at all phrase scales, which results in two inference modes: one in which the parser is discarded and only the seq2seq component is used at the sequence level, and another in which the parser is combined with the seq2seq model. Decoding in the latter mode is done with the cube-pruned CKY algorithm, which is more involved but can make use of new translation rules during inference. We formalize our model as a source-conditioned synchronous grammar and develop an efficient variational inference algorithm for training. When applied on top of both randomly initialized and pretrained seq2seq models, we find that both inference modes perform well compared to baselines on small-scale machine translation benchmarks.
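
To make the bracketing transduction grammar (BTG) component concrete, here is a minimal sketch of Viterbi CKY-style parsing over span pairs under a BTG, written as a memoized top-down recursion. This is not the authors' implementation: `phrase_score` and `max_phrase_len` are hypothetical stand-ins for the paper's learned span-pair scores and phrase-length limit. A BTG derivation covers a (source-span, target-span) pair either as a single aligned phrase or by splitting both spans and combining the halves in straight (monotone) or inverted (reordered) order.

```python
import math
from functools import lru_cache

def btg_viterbi(src, tgt, phrase_score, max_phrase_len=4):
    """Best BTG derivation score aligning all of src to all of tgt."""

    @lru_cache(maxsize=None)
    def best(si, sj, ti, tj):
        # Base case: treat the whole span pair as one aligned phrase.
        score = -math.inf
        if sj - si <= max_phrase_len and tj - ti <= max_phrase_len:
            score = phrase_score(src[si:sj], tgt[ti:tj])
        # Recursive case: split both spans; the two halves combine either
        # monotonically (straight rule) or with reordering (inverted rule).
        for sk in range(si + 1, sj):
            for tk in range(ti + 1, tj):
                straight = best(si, sk, ti, tk) + best(sk, sj, tk, tj)
                inverted = best(si, sk, tk, tj) + best(sk, sj, ti, tk)
                score = max(score, straight, inverted)
        return score

    return best(0, len(src), 0, len(tgt))

# Toy usage with a dummy scorer that prefers length-matched phrase pairs.
print(btg_viterbi("the red house".split(), "la maison rouge".split(),
                  phrase_score=lambda s, t: -abs(len(s) - len(t))))
```

The paper's second inference mode runs a chart search of this general shape, but scores phrases with the seq2seq model and applies cube pruning to keep each chart cell's hypothesis list small; the sketch above omits pruning and the backpointers needed to recover the derivation tree.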
