Paper Title
Small Batch Sizes Improve Training of Low-Resource Neural MT
Paper Authors
Paper Abstract
We study the role of an essential hyper-parameter that governs the training of Transformers for neural machine translation in a low-resource setting: the batch size. Using theoretical insights and experimental evidence, we argue against the widespread belief that batch size should be set as large as allowed by the memory of the GPUs. We show that in a low-resource setting, a smaller batch size leads to higher scores in a shorter training time, and argue that this is due to better regularization of the gradients during training.
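As a rough illustration of what this hyper-parameter controls in practice, the sketch below groups tokenized sentences into token-budgeted batches, which is how NMT toolkits typically define batch size. The function `make_token_batches` and the token budgets are hypothetical and not taken from the paper; a "small" batch simply corresponds to a smaller token budget than the GPU memory would allow.

```python
# Illustrative only: token-based batching as commonly used in NMT training.
# `max_tokens` plays the role of the batch-size hyper-parameter discussed in
# the abstract; the specific values below are hypothetical, not from the paper.

from typing import List, Sequence


def make_token_batches(sentences: Sequence[List[str]], max_tokens: int) -> List[List[List[str]]]:
    """Group tokenized sentences into batches holding at most `max_tokens` source tokens."""
    batches, current, current_tokens = [], [], 0
    for sent in sentences:
        # Start a new batch if adding this sentence would exceed the token budget.
        if current and current_tokens + len(sent) > max_tokens:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(sent)
        current_tokens += len(sent)
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Toy "low-resource" corpus of 100 short tokenized sentences.
    corpus = [f"src sentence {i}".split() for i in range(100)]
    # A large budget packs the whole corpus into few batches; a small budget
    # yields many more, smaller gradient updates per epoch.
    for max_tokens in (4096, 256):  # hypothetical budgets
        n = len(make_token_batches(corpus, max_tokens))
        print(f"max_tokens={max_tokens}: {n} batches per epoch")
```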