Paper Title
Linguistically-Informed Transformations (LIT): A Method for Automatically Generating Contrast Sets
Paper Authors
Paper Abstract
Although large-scale pretrained language models, such as BERT and RoBERTa, have achieved superhuman performance on in-distribution test sets, their performance suffers on out-of-distribution test sets (e.g., on contrast sets). Building contrast sets often requires human-expert annotation, which is expensive and difficult to scale. In this work, we propose a Linguistically-Informed Transformation (LIT) method to automatically generate contrast sets, which enables practitioners to explore linguistic phenomena of interest as well as to compose different phenomena. Experimenting with our method on SNLI and MNLI shows that current pretrained language models, although claimed to contain sufficient linguistic knowledge, struggle on our automatically generated contrast sets. Furthermore, we improve models' performance on the contrast sets by applying LIT to augment the training data, without affecting performance on the original data.
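As a rough illustration of the kind of rule-based transformation the abstract describes, the toy Python sketch below negates the hypothesis of an NLI pair and flips its label (entailment to contradiction and vice versa). The negation rule and the label-flip table here are simplified assumptions for illustration only; they are not the paper's actual LIT transformation set.

```python
# Illustrative sketch: generate a contrast example for NLI by negating the
# hypothesis and flipping the label. The transformation rule below is a
# simplified assumption, not the paper's actual LIT rule set.

def negate_hypothesis(hypothesis: str) -> str:
    """Insert 'not' after the first copula/auxiliary, if one is present."""
    auxiliaries = {"is", "are", "was", "were"}
    tokens = hypothesis.split()
    for i, tok in enumerate(tokens):
        if tok.lower() in auxiliaries:
            return " ".join(tokens[: i + 1] + ["not"] + tokens[i + 1 :])
    return hypothesis  # no applicable site: leave the sentence unchanged

# Neutral examples are skipped, since negation does not determine a new label.
LABEL_FLIP = {"entailment": "contradiction", "contradiction": "entailment"}

def generate_contrast_example(premise: str, hypothesis: str, label: str):
    """Return a transformed (premise, hypothesis, label) dict, or None if
    the transformation does not apply to this example."""
    new_hypothesis = negate_hypothesis(hypothesis)
    if new_hypothesis == hypothesis or label not in LABEL_FLIP:
        return None
    return {"premise": premise, "hypothesis": new_hypothesis,
            "label": LABEL_FLIP[label]}

if __name__ == "__main__":
    contrast = generate_contrast_example(
        "A man is playing a guitar on stage.",
        "A man is performing music.",
        "entailment",
    )
    print(contrast)
    # {'premise': 'A man is playing a guitar on stage.',
    #  'hypothesis': 'A man is not performing music.',
    #  'label': 'contradiction'}
```

Generated examples of this form could, as the abstract suggests, be appended to the original training data as augmentation rather than replacing it.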