Paper Title

Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance

Paper Authors

Prasetya Ajie Utama, Nafise Sadat Moosavi, Iryna Gurevych

Paper Abstract

Models for natural language understanding (NLU) tasks often rely on the idiosyncratic biases of the dataset, which make them brittle against test cases outside the training distribution. Recently, several debiasing methods have been proposed and shown to be very effective in improving out-of-distribution performance. However, their improvements come at the expense of a performance drop when models are evaluated on the in-distribution data, which contain examples with higher diversity. This seemingly inevitable trade-off may not tell us much about the changes in the reasoning and understanding capabilities of the resulting models on broader types of examples beyond the small subset represented in the out-of-distribution data. In this paper, we address this trade-off by introducing a novel debiasing method, called confidence regularization, which discourages models from exploiting biases while enabling them to receive enough incentive to learn from all the training examples. We evaluate our method on three NLU tasks and show that, in contrast to its predecessors, it improves the performance on out-of-distribution datasets (e.g., a 7pp gain on the HANS dataset) while maintaining the original in-distribution accuracy.
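The abstract describes confidence regularization only at a high level. Below is a minimal, hypothetical PyTorch sketch of one way such a scheme could work: a bias-only model's confidence on the gold label is used to flatten a teacher's soft labels, so the main (student) model receives weaker training signal exactly on the examples the bias alone can solve. The function names (scale_teacher_probs, confidence_regularization_loss) and the specific power-scaling form are illustrative assumptions, not the authors' released implementation.

# Hypothetical sketch of confidence regularization via bias-scaled self-distillation.
# Assumes PyTorch; this is NOT the authors' code, only an illustration of the idea.
import torch
import torch.nn.functional as F


def scale_teacher_probs(teacher_probs: torch.Tensor,
                        bias_prob_gold: torch.Tensor) -> torch.Tensor:
    """Raise teacher probabilities to the power (1 - beta) and renormalize.

    teacher_probs:  (batch, num_classes) softmax output of a normally trained teacher.
    bias_prob_gold: (batch,) probability the bias-only model assigns to the gold label;
                    values near 1 mean the example is solvable by the bias alone,
                    so the teacher distribution gets flattened (lower confidence).
    """
    exponent = (1.0 - bias_prob_gold).unsqueeze(1)        # (batch, 1)
    scaled = teacher_probs.clamp_min(1e-12) ** exponent   # flatten rows with high bias confidence
    return scaled / scaled.sum(dim=1, keepdim=True)       # renormalize to a valid distribution


def confidence_regularization_loss(student_logits: torch.Tensor,
                                   teacher_probs: torch.Tensor,
                                   bias_prob_gold: torch.Tensor) -> torch.Tensor:
    """Cross-entropy between the student and the bias-scaled teacher distribution."""
    soft_targets = scale_teacher_probs(teacher_probs, bias_prob_gold)
    log_probs = F.log_softmax(student_logits, dim=1)
    return -(soft_targets * log_probs).sum(dim=1).mean()


# Toy usage: 4 examples, 3 classes (e.g., NLI labels entailment/neutral/contradiction).
student_logits = torch.randn(4, 3, requires_grad=True)
teacher_probs = F.softmax(torch.randn(4, 3), dim=1)
bias_prob_gold = torch.tensor([0.95, 0.30, 0.70, 0.10])  # from a bias-only model
loss = confidence_regularization_loss(student_logits, teacher_probs, bias_prob_gold)
loss.backward()

Because every example still contributes a (possibly softened) training signal, this kind of scaling keeps the incentive to learn from the full training set, which is the property the abstract credits for preserving in-distribution accuracy.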
