Paper Title

A Streaming Approach For Efficient Batched Beam Search

Authors

Kevin Yang, Violet Yao, John DeNero, Dan Klein

Abstract

We propose an efficient batching strategy for variable-length decoding on GPU architectures. During decoding, when candidates terminate or are pruned according to heuristics, our streaming approach periodically "refills" the batch before proceeding with a selected subset of candidates. We apply our method to variable-width beam search on a state-of-the-art machine translation model. Our method decreases runtime by up to 71% compared to a fixed-width beam search baseline and 17% compared to a variable-width baseline, while matching baselines' BLEU. Finally, experiments show that our method can speed up decoding in other domains, such as semantic and syntactic parsing.
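The "refill" idea can be illustrated with a toy sketch. This is not the authors' implementation: it uses no beam search, no pruning heuristics, and no GPU, and the names `streaming_decode`, `step`, `is_done`, `batch_size`, and `refill_below` are hypothetical, introduced only for this example. It simply shows how topping up a batch as sequences finish can reduce the number of batched decoding steps when sequence lengths vary.

```python
from collections import deque

def streaming_decode(items, step, is_done, batch_size, refill_below):
    """Decode `items` with a batch that is topped up as entries finish.

    Toy illustration only. `step` advances one state by one decoding step,
    `is_done` says whether a state has terminated. The batch is refilled
    from the pending queue whenever fewer than `refill_below` entries are
    active (or the batch is empty).
    """
    pending = deque(enumerate(items))  # (original index, state) pairs
    active = []                        # entries currently in the batch
    results = {}
    num_steps = 0
    while pending or active:
        # Refill: pull new inputs in once the batch has shrunk enough.
        if len(active) < refill_below or not active:
            while pending and len(active) < batch_size:
                active.append(pending.popleft())
        # One "batched" step over every active entry.
        active = [(i, step(s)) for i, s in active]
        num_steps += 1
        # Retire finished entries; the rest stay in the batch.
        still_active = []
        for i, s in active:
            if is_done(s):
                results[i] = s
            else:
                still_active.append((i, s))
        active = still_active
    outputs = [results[i] for i in range(len(results))]
    return outputs, num_steps

# Toy "model": each state is (target_length, tokens_generated_so_far).
lengths = [3, 1, 2, 5, 1, 4]
states = [(n, 0) for n in lengths]
advance = lambda s: (s[0], s[1] + 1)
finished = lambda s: s[1] >= s[0]

# Refill as soon as any slot frees up.
stream_out, stream_steps = streaming_decode(
    states, advance, finished, batch_size=2, refill_below=2)
# Refill only when the batch is empty, which mimics static batching.
static_out, static_steps = streaming_decode(
    states, advance, finished, batch_size=2, refill_below=1)
```

On this toy input, the streaming setting finishes in 8 batched steps and the static-style setting in 12, with identical outputs. The `refill_below` threshold stands in for the trade-off the abstract calls periodic refilling: refilling eagerly keeps the batch full, while refilling rarely avoids the overhead of admitting new inputs.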
