Paper Title
EffEval: A Comprehensive Evaluation of Efficiency for MT Evaluation Metrics
Paper Authors
Paper Abstract
Efficiency is a key property to foster inclusiveness and reduce environmental costs, especially in an era of LLMs. In this work, we provide a comprehensive evaluation of efficiency for MT evaluation metrics. Our approach involves replacing computation-intensive transformers with lighter alternatives and employing linear and quadratic approximations for alignment algorithms on top of LLM representations. We evaluate six (reference-free and reference-based) metrics across three MT datasets and examine 16 lightweight transformers. In addition, we look into the training efficiency of metrics like COMET by utilizing adapters. Our results indicate that (a) TinyBERT provides the optimal balance between quality and efficiency; (b) CPU speed-ups are more substantial than those on GPU; (c) WMD approximations yield no efficiency gains while reducing quality; and (d) adapters enhance training efficiency (regarding backward pass speed and memory requirements) as well as, in some cases, metric quality. These findings can help to strike a balance between evaluation speed and quality, which is essential for effective NLG systems. Furthermore, our research contributes to the ongoing efforts to optimize NLG evaluation metrics with minimal impact on performance. To our knowledge, ours is the most comprehensive analysis of different aspects of efficiency for MT metrics conducted so far.
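To illustrate the backbone-swapping idea the abstract describes, the sketch below computes a BERTScore-style greedy-alignment F1 on top of a lightweight TinyBERT encoder via Hugging Face transformers. This is a minimal illustration, not the paper's implementation: the checkpoint name and the use of last-layer representations are assumptions made here for the example.

```python
# Illustrative sketch (not the authors' implementation): a BERTScore-style
# greedy-alignment F1 computed with a lightweight TinyBERT backbone, showing
# how a computation-intensive encoder can be swapped for a lighter one.
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed lightweight checkpoint; any small encoder could be substituted.
MODEL_NAME = "huawei-noah/TinyBERT_General_4L_312D"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME).eval()


@torch.no_grad()
def embed(sentence: str) -> torch.Tensor:
    """Return L2-normalized token embeddings from the last hidden layer."""
    enc = tokenizer(sentence, return_tensors="pt", truncation=True)
    hidden = model(**enc).last_hidden_state.squeeze(0)  # (num_tokens, dim)
    return torch.nn.functional.normalize(hidden, dim=-1)


def bertscore_f1(candidate: str, reference: str) -> float:
    """Greedy token alignment: cosine similarities, max over rows/columns."""
    cand, ref = embed(candidate), embed(reference)
    sim = cand @ ref.T                        # pairwise cosine similarity matrix
    precision = sim.max(dim=1).values.mean()  # best reference match per candidate token
    recall = sim.max(dim=0).values.mean()     # best candidate match per reference token
    return (2 * precision * recall / (precision + recall)).item()


print(bertscore_f1("The cat sat on the mat.", "A cat is sitting on the mat."))
```

The same swap applies to the other embedding-based metrics studied in the paper: only the encoder changes, while the alignment step on top of the token representations stays the same.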